db <- read.csv("Data Export Summary.csv", row.names=1)
head(db)
##                          X.ray   NMR   EM Multiple.methods Neutron Other  Total
## Protein (only)          142303 11804 5999              177      70    32 160385
## Protein/Oligosaccharide   8414    31  979                5       0     0   9429
## Protein/NA                7491   274 1986                3       0     0   9754
## Nucleic acid (only)       2368  1372   60                8       2     1   3811
## Other                      149    31    3                0       0     0    183
## Oligosaccharide (only)      11     6    0                1       0     4     22
Q1: What percentage of structures in the PDB are solved by X-Ray and Electron Microscopy?
totalxray <- sum(db$X.ray)
totalem <- sum(db$EM)
total.sums <- sum(db$Total)
perc <- ( (totalxray + totalem)/total.sums ) * 100
perc
## [1] 92.47157
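The same calculation can be done for every method at once. The following is a minimal sketch that rebuilds the table from the values printed above rather than reading the CSV, so it runs on its own; the names `method.perc` and `xray.em.perc` are illustrative and not part of the original analysis.

```r
# Rebuild the summary table from the values printed above (not read from
# the CSV), so the sketch runs on its own.
db <- data.frame(
  X.ray            = c(142303, 8414, 7491, 2368, 149, 11),
  NMR              = c(11804, 31, 274, 1372, 31, 6),
  EM               = c(5999, 979, 1986, 60, 3, 0),
  Multiple.methods = c(177, 5, 3, 8, 0, 1),
  Neutron          = c(70, 0, 0, 2, 0, 0),
  Other            = c(32, 0, 0, 1, 0, 4),
  Total            = c(160385, 9429, 9754, 3811, 183, 22),
  row.names = c("Protein (only)", "Protein/Oligosaccharide", "Protein/NA",
                "Nucleic acid (only)", "Other", "Oligosaccharide (only)")
)

# Percentage of all structures contributed by each column (method)
method.perc <- colSums(db) / sum(db$Total) * 100
round(method.perc, 2)

# X-ray and EM combined, which should agree with the Q1 result
xray.em.perc <- method.perc[["X.ray"]] + method.perc[["EM"]]
xray.em.perc
```

Using `colSums()` avoids one `sum()` call per column and makes it easy to compare all methods side by side.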
Q2: What proportion of structures in the PDB are protein?
propProtein <- db["Protein (only)", "Total"] / total.sums # look up the count instead of hard-coding 160385
propProtein
## [1] 0.8736328
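The answer above counts only the "Protein (only)" row. As a sketch, the proportions for every molecular type can be computed together from the Total column; the vector `totals` below is typed in from the table above, and treating every row whose name starts with "Protein" as protein-containing is an assumption made here for comparison, not part of the original answer.

```r
# Total column of the summary table, typed in from the output above
totals <- c("Protein (only)"          = 160385,
            "Protein/Oligosaccharide" = 9429,
            "Protein/NA"              = 9754,
            "Nucleic acid (only)"     = 3811,
            "Other"                   = 183,
            "Oligosaccharide (only)"  = 22)

# Proportion of all structures in each molecular type
prop <- totals / sum(totals)
round(prop, 4)

# Assumption: rows whose name starts with "Protein" all contain protein
protein.rows <- grepl("^Protein", names(totals))
sum(prop[protein.rows])
```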
Q3: Type HIV in the PDB website search box on the home page and determine how many HIV-1 protease structures are in the current PDB?
There are 591 HIV-1 protease structures. HIV-1 protease is an important drug target.
Q4: Water molecules normally have 3 atoms. Why do we see just one atom per water molecule in this structure?
We see only the oxygen atom of each water molecule because hydrogen atoms are too small to be resolved at the resolution of this structure.
Q5: There is a conserved water molecule in the binding site. Can you identify this water molecule? What residue number does this water molecule have (see note below)?
The conserved water molecule is HOH308:0, which is residue number 308.
Load bio3d package
library(bio3d)
Read the pdb file
pdb <- read.pdb("1hsg") # accessing the online 1hsg file
## Note: Accessing on-line PDB file
pdb
##
## Call: read.pdb(file = "1hsg")
##
## Total Models#: 1
## Total Atoms#: 1686, XYZs#: 5058 Chains#: 2 (values: A B)
##
## Protein Atoms#: 1514 (residues/Calpha atoms#: 198)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 172 (residues: 128)
## Non-protein/nucleic resid values: [ HOH (127), MK1 (1) ]
##
## Protein sequence:
## PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
## QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE
## ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP
## VNIIGRNLLTQIGCTLNF
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
Q7: How many amino acid residues are there in this pdb object?
There are 198 amino acid residues.
Q8: Name one of the two non-protein residues?
One non-protein residue is HOH (water); the other is MK1.
Q9: How many protein chains are in this structure?
There are 2 protein chains in this structure.
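The residue and chain counts can be cross-checked against the sequence printed in the summary. The sketch below uses base R only and pastes in the four sequence lines from the output above; the variable names are illustrative, and splitting the sequence at its midpoint assumes the two chains are the same length.

```r
# Sequence lines copied from the printed pdb summary above
seq.lines <- c(
  "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD",
  "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE",
  "ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP",
  "VNIIGRNLLTQIGCTLNF"
)
full.seq <- paste0(seq.lines, collapse = "")

# Total number of residues in the printed sequence
n.res <- nchar(full.seq)
n.res

# Split at the midpoint (assumes two equal-length chains) and compare halves
half <- n.res / 2
chain1 <- substr(full.seq, 1, half)
chain2 <- substr(full.seq, half + 1, n.res)
identical(chain1, chain2)
```

If the two halves match, the structure is consistent with two copies of the same chain, which agrees with the two chain labels (A and B) reported in the summary.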
Find the attributes of the object
attributes(pdb)
## $names
## [1] "atom" "xyz" "seqres" "helix" "sheet" "calpha" "remark" "call"
##
## $class
## [1] "pdb" "sse"
head(pdb$atom)
##   type eleno elety  alt resid chain resno insert      x      y     z o     b
## 1 ATOM     1     N <NA>   PRO     A     1   <NA> 29.361 39.686 5.862 1 38.10
## 2 ATOM     2    CA <NA>   PRO     A     1   <NA> 30.307 38.663 5.319 1 40.62
## 3 ATOM     3     C <NA>   PRO     A     1   <NA> 29.760 38.071 4.022 1 42.64
## 4 ATOM     4     O <NA>   PRO     A     1   <NA> 28.600 38.302 3.676 1 43.40
## 5 ATOM     5    CB <NA>   PRO     A     1   <NA> 30.508 37.541 6.342 1 37.87
## 6 ATOM     6    CG <NA>   PRO     A     1   <NA> 29.296 37.591 7.162 1 38.40
##   segid elesy charge
## 1  <NA>     N   <NA>
## 2  <NA>     C   <NA>
## 3  <NA>     C   <NA>
## 4  <NA>     O   <NA>
## 5  <NA>     C   <NA>
## 6  <NA>     C   <NA>
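Because the atom records print as an ordinary data frame, standard data-frame indexing should apply to them. The sketch below is a stand-in, not the real object: `atoms` is a hypothetical data frame holding only the six rows and a subset of the columns shown above, so it runs without bio3d or network access.

```r
# Hypothetical stand-in for the first six rows of pdb$atom (subset of columns),
# typed in from the output above
atoms <- data.frame(
  eleno = 1:6,
  elety = c("N", "CA", "C", "O", "CB", "CG"),
  resid = "PRO",
  chain = "A",
  resno = 1,
  x = c(29.361, 30.307, 29.760, 28.600, 30.508, 29.296),
  y = c(39.686, 38.663, 38.071, 38.302, 37.541, 37.591),
  z = c(5.862, 5.319, 4.022, 3.676, 6.342, 7.162),
  b = c(38.10, 40.62, 42.64, 43.40, 37.87, 38.40),
  stringsAsFactors = FALSE
)

# Select the alpha-carbon row by atom name and show its coordinates
ca <- atoms[atoms$elety == "CA", ]
ca[, c("x", "y", "z")]

# Mean B-factor over these six atoms
mean.b <- mean(atoms$b)
mean.b
```

The same logical-indexing pattern would be expected to work on the full `pdb$atom` table, for example to filter by `chain` or `resid`.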